Statistical supports for mining sequential patterns and improving the incremental update process on data streams
Authors
Abstract
Recently, the knowledge extraction community has taken a closer look at new models in which data arrive in a timely manner as a fast and continuous flow, i.e. data streams. As only a part of the stream can be stored, mining data streams for sequential patterns and updating previously found frequent patterns must cope with uncertainty. In this paper, we introduce a new statistical approach that biases the initial support for sequential patterns. This approach has the advantage of maximizing either the precision or the recall, as chosen by the user, while limiting the degradation of the other criterion. Moreover, these statistical supports help build statistical borders, which are the relevant sets of frequent patterns to use in an incremental mining process. Theoretical results show that the technique is not far from the optimum from a statistical standpoint. Experiments performed on sequential patterns demonstrate the usefulness of this approach and the potential of such techniques.
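The abstract does not state which statistical bound is used to bias the support, so the following Python sketch is only an illustration under an assumption: it treats the n sequences kept from the stream as an i.i.d. sample and uses a one-sided Hoeffding bound, giving a margin ε = sqrt(ln(1/δ) / (2n)). Lowering the user threshold σ to σ - ε favours recall (for each pattern, one whose true support is at least σ is missed with probability at most δ), while raising it to σ + ε favours precision (a pattern whose true support is below σ is reported with probability at most δ). The "border" returned here, i.e. the patterns whose observed support falls between the two biased thresholds, is a hypothetical stand-in for the paper's statistical borders, and all function names (`seq`, `contains`, `observed_support`, `hoeffding_margin`, `statistical_supports`, `mine`) are illustrative.

```python
import math
from typing import FrozenSet, List, Tuple

# A data sequence (and a sequential pattern) is an ordered tuple of itemsets.
Itemset = FrozenSet[str]
Seq = Tuple[Itemset, ...]


def seq(*itemsets: str) -> Seq:
    """Build a sequence from strings, e.g. seq("ab", "c") -> <(a b) (c)>."""
    return tuple(frozenset(s) for s in itemsets)


def contains(data: Seq, pattern: Seq) -> bool:
    """True if each pattern itemset is a subset of a distinct data itemset,
    in the same order. Greedy earliest matching is sufficient here."""
    pos = 0
    for itemset in pattern:
        while pos < len(data) and not itemset <= data[pos]:
            pos += 1
        if pos == len(data):
            return False
        pos += 1
    return True


def observed_support(batch: List[Seq], pattern: Seq) -> float:
    """Fraction of the stored sequences that contain `pattern`."""
    return sum(contains(s, pattern) for s in batch) / len(batch)


def hoeffding_margin(n: int, delta: float) -> float:
    """ASSUMPTION: one-sided Hoeffding margin, a stand-in for the paper's bound.
    P(observed - true >= eps) <= delta, and likewise for true - observed."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def statistical_supports(sigma: float, n: int, delta: float) -> Tuple[float, float]:
    """Return (recall-oriented, precision-oriented) biased thresholds."""
    eps = hoeffding_margin(n, delta)
    return max(0.0, sigma - eps), min(1.0, sigma + eps)


def mine(batch: List[Seq], candidates: List[Seq], sigma: float,
         delta: float, favour: str = "recall") -> Tuple[List[Seq], List[Seq]]:
    """Report frequent candidates under the biased threshold chosen by `favour`,
    plus a hypothetical 'border' of uncertain patterns whose observed support
    lies between the two thresholds (candidates to re-check on the next update)."""
    if favour not in ("recall", "precision"):
        raise ValueError("favour must be 'recall' or 'precision'")
    low, high = statistical_supports(sigma, len(batch), delta)
    threshold = low if favour == "recall" else high
    frequent, border = [], []
    for p in candidates:
        s = observed_support(batch, p)
        if s >= threshold:
            frequent.append(p)
        if low <= s < high:
            border.append(p)
    return frequent, border


if __name__ == "__main__":
    batch = [seq("ab", "c")] * 5 + [seq("a", "d")] * 5
    candidates = [seq("a"), seq("a", "c"), seq("e")]
    for mode in ("recall", "precision"):
        frequent, border = mine(batch, candidates, sigma=0.5, delta=0.05, favour=mode)
        print(mode, len(frequent), "frequent,", len(border), "in border")
```

On this toy batch of 10 sequences the margin is wide (about 0.387), so recall mode reports both `<(a)>` and `<(a)(c)>`, precision mode reports only `<(a)>`, and `<(a)(c)>` (observed support 0.5) stays in the border in both modes. Because ε shrinks as 1/sqrt(n), the two thresholds converge on σ as more of the stream is stored.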
Similar resources
Incremental Mining of Across-streams Sequential Patterns in Multiple Data Streams
Sequential pattern mining is the mining of data sequences for frequent, time-ordered sequential patterns, and it has wide applications. Data streams are streams of data that arrive at high speed. Due to limited memory capacity and the need for real-time mining, mining results must be updated in real time. Multiple data streams are the simultaneous arrival of a plurality ...
Incremental update on sequential patterns in large databases
Mining sequential patterns in a transactional database is time-consuming due to its complexity. Maintaining the present patterns after a database update is a non-trivial task, since appended data sequences may invalidate old patterns and create new ones. In contrast to re-mining, the proposed incremental update algorithm, which effectively utilizes discovered knowledge, is the key to improve m...
Incremental Mining of Closed Sequential Patterns in Multiple Data Streams
Sequential pattern mining searches for the relative sequence of events, allowing users to make predictions based on discovered sequential patterns. Due to drastic advances in information technology in recent years, data change rapidly, data volumes have exploded, and real-time demand is increasing, leading to the data stream environment. Data in this environment cannot be fully stored...
A Single-scan Algorithm for Mining Sequential Patterns from Data Streams
Sequential pattern mining (SPAM) is one of the most interesting research issues in data mining. In this paper, a new research problem of mining data streams for sequential patterns is defined. A data stream is an unbounded sequence of data elements arriving at a rapid rate. Based on the characteristics of data streams, the problem complexity of mining data streams for sequential patterns is more ...
Efficiently Mining High Utility Sequential Patterns in Static and Streaming Data
High utility sequential pattern (HUSP) mining has emerged as a novel topic in data mining. Although some preliminary works have been conducted on this topic, they incur the problem of producing a large search space for high utility sequential patterns. In addition, they mainly focus on mining HUSPs in static databases and do not take streaming data into account, where unbounded data come contin...
Journal title:
- Intell. Data Anal.
Volume 11, Issue
Pages: -
Publication date: 2007